Ranking Web Pages Using Collective Knowledge

Authors

  • Falah Hassan Al-akashi
  • Diana Inkpen
Abstract

Indexing is a crucial technique for dealing with the massive amount of data present on the web. Indexing can be performed based on words or on phrases. Our approach aims to index web documents efficiently by employing a hybrid technique that makes use of the knowledge available in Wikipedia and in meta-content. Our preliminary experiments on the TREC dataset have shown that our indexing scheme is a robust and efficient method for both indexing and retrieving relevant web pages. We ranked query terms in different ways, depending on whether or not they were found in Wikipedia pages. This paper presents our preliminary algorithm and experiments for the ad-hoc and diversity tasks of the TREC 2011 Web track. We ran our system on subset B (50 million web documents) of the ClueWeb09 dataset.

Categories and Subject Descriptors: Web Information Retrieval: Content Analysis, Indexing, and Ranking
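
The idea in the abstract of ranking a query differently depending on whether it is found in Wikipedia can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify; it is a toy example under stated assumptions: documents are dictionaries with hypothetical `body` and `meta` fields, `wiki_titles` is a hypothetical set of lowercase Wikipedia page titles, and `wiki_boost` is an arbitrary illustrative weight.

```python
from collections import defaultdict


def rank(query, docs, wiki_titles, wiki_boost=2.0):
    """Rank documents for a query (toy illustration, not the paper's method).

    Assumption: when the whole query matches a Wikipedia title, matches in
    a document's meta-content count `wiki_boost` times as much as otherwise.
    """
    terms = query.lower().split()
    in_wiki = query.lower() in wiki_titles
    meta_weight = wiki_boost if in_wiki else 1.0

    scores = defaultdict(float)
    for doc_id, doc in docs.items():
        body = doc.get("body", "").lower().split()
        meta = doc.get("meta", "").lower().split()
        for term in terms:
            scores[doc_id] += body.count(term)                # plain term frequency
            scores[doc_id] += meta.count(term) * meta_weight  # weighted meta match

    # Keep only matching documents; break score ties by document id.
    matched = [d for d in scores if scores[d] > 0]
    return sorted(matched, key=lambda d: (-scores[d], d))


docs = {
    "a": {"body": "jaguar car review", "meta": "cars"},
    "b": {"body": "big cat", "meta": "jaguar animal"},
}
with_wiki = rank("jaguar", docs, wiki_titles={"jaguar"})
without_wiki = rank("jaguar", docs, wiki_titles=set())
```

In this toy setup the two documents tie when the query is not a known Wikipedia title, and the document whose meta-content matches moves ahead when it is. A real system would replace the raw counts with a proper retrieval model.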

Similar resources

A New Hybrid Method for Web Pages Ranking in Search Engines

There are many algorithms for optimizing search engine results. Ranking takes place according to one or more parameters, such as backward links, forward links, content, and click-through rate. The quality and performance of these algorithms depend on the listed parameters. The ranking is one of the most important components of the search engine that represents the degree of the vitality...

Efficient Methodologies to Handle Hanging Pages Using Virtual Node

In this paper we first explain the Knowledge Extraction (KE) process from the World Wide Web (WWW) using search engines. We then explore the PageRank algorithm of the Google search engine (a well-known link-based search engine) with its hidden Markov analysis. We also explore one of the problems of link-based ranking algorithms, called hanging pages or dangling pages (pages without any forw...

Web pages ranking algorithm based on reinforcement learning and user feedback

The main challenge of a search engine is ranking web documents to provide the best response to a user's query. Despite the huge number of results extracted for a user's query, users examine only a small number of the first results; placing the relevant results in the top ranks is therefore of great importance. In this paper, a ranking algorithm based on the reinforcement le...

Modeling and Leveraging Social Collective Intelligence

The rise of social interactions on the Web requires developing new methods of information organization and discovery. To that end, we propose a generative community-based probabilistic tagging model that can automatically uncover communities of users and their associated tags. We experimentally validate the quality of the discovered communities over the social bookmarking system Delicious. In c...

A semantic self-organising webpage-ranking algorithm using computational geometry across different knowledge domains

In this paper we introduce a method for Web page-ranking, based on computational geometry to evaluate and test by examples, order relationships among web pages belonging to different knowledge domains. The goal is, through an organising procedure, to learn from these examples a real-valued ranking function that induces ranking via a convexity feature. We consider the problem of self-organising ...

Publication date: 2011